Goto

Collaborating Authors

 Herisau


HiRes-FusedMIM: A High-Resolution RGB-DSM Pre-trained Model for Building-Level Remote Sensing Applications

arXiv.org Artificial Intelligence

Recent advances in self-supervised learning have led to the development of foundation models that have significantly advanced performance in various computer vision tasks. However, despite their potential, these models often overlook the crucial role of high-resolution digital surface models (DSMs) in understanding urban environments, particularly for building-level analysis, which is essential for applications like digital twins. To address this gap, we introduce HiRes-FusedMIM, a novel pre-trained model specifically designed to leverage the rich information contained within high-resolution RGB and DSM data. HiRes-FusedMIM utilizes a dual-encoder simple masked image modeling (SimMIM) architecture with a multi-objective loss function that combines reconstruction and contrastive objectives, enabling it to learn powerful, joint representations from both modalities. We conducted a comprehensive evaluation of HiRes-FusedMIM on a diverse set of downstream tasks, including classification, semantic segmentation, and instance segmentation. Our results demonstrate that: 1) HiRes-FusedMIM outperforms previous state-of-the-art geospatial methods on several building-related datasets, including WHU Aerial and LoveDA, demonstrating its effectiveness in capturing and leveraging fine-grained building information; 2) Incorporating DSMs during pre-training consistently improves performance compared to using RGB data alone, highlighting the value of elevation information for building-level analysis; 3) The dual-encoder architecture of HiRes-FusedMIM, with separate encoders for RGB and DSM data, significantly outperforms a single-encoder model on the Vaihingen segmentation task, indicating the benefits of learning specialized representations for each modality. To facilitate further research and applications in this direction, we will publicly release the trained model weights.


Design, Fabrication and Evaluation of a Stretchable High-Density Electromyography Array

arXiv.org Artificial Intelligence

The adoption of high-density electrode systems for human-machine interfaces in real-life applications has been impeded by practical and technical challenges, including noise interference, motion artifacts and the lack of compact electrode interfaces. To overcome some of these challenges, we introduce a wearable and stretchable electromyography (EMG) array, and present its design, fabrication methodology, characterisation, and comprehensive evaluation. Our proposed solution comprises dry-electrodes on flexible printed circuit board (PCB) substrates, eliminating the need for time-consuming skin preparation. The proposed fabrication method allows the manufacturing of stretchable sleeves, with consistent and standardised coverage across subjects. We thoroughly tested our developed prototype, evaluating its potential for application in both research and real-world environments. The results of our study showed that the developed stretchable array matches or outperforms traditional EMG grids and holds promise in furthering the real-world translation of high-density EMG for human-machine interfaces.


CODET: A Benchmark for Contrastive Dialectal Evaluation of Machine Translation

arXiv.org Artificial Intelligence

Neural machine translation (NMT) systems exhibit limited robustness in handling source-side linguistic variations. Their performance tends to degrade when faced with even slight deviations in language usage, such as different domains or variations introduced by second-language speakers. It is intuitive to extend this observation to encompass dialectal variations as well, but the work allowing the community to evaluate MT systems on this dimension is limited. To alleviate this issue, we compile and release \dataset, a contrastive dialectal benchmark encompassing 882 different variations from nine different languages. We also quantitatively demonstrate the challenges large MT models face in effectively translating dialectal variants. We are releasing all code and data.